In this paper, we introduce a regularized mean-field game and study learning of this game under an infinite-horizon discounted reward criterion. Regularization is introduced by adding a strongly concave regularization term to the one-stage reward function of the classical mean-field game model. We develop a value-iteration-based learning algorithm for this regularized mean-field game using fitted Q-learning. In general, the regularization term makes the reinforcement learning algorithm more robust to the system components. Moreover, it enables us to establish an error analysis of the learning algorithm without imposing restrictive convexity assumptions on the system components, which are needed in the absence of regularization.
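As a minimal illustration of the role of the regularizer, the sketch below runs regularized value iteration on a toy finite model with entropy regularization, one common choice of strongly concave regularizer; the mean-field coupling is held fixed and all model quantities are hypothetical, so this is a sketch of the idea rather than the paper's algorithm.

```python
import numpy as np

# Regularized value iteration on a toy finite model with a frozen mean-field term.
# Entropy regularization (strongly concave) replaces the Bellman max with a
# smooth log-sum-exp, which also yields a unique softmax policy.
def regularized_value_iteration(R, P, beta=0.9, lam=0.1, iters=500):
    """R: (S, A) one-stage rewards; P: (S, A, S) transition kernel."""
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = lam * np.log(np.exp(Q / lam).sum(axis=1))  # soft maximum over actions
        Q = R + beta * (P @ V)                          # regularized Bellman update
    pi = np.exp(Q / lam)
    pi /= pi.sum(axis=1, keepdims=True)                 # softmax (regularized) policy
    return Q, pi

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(4), size=(4, 2))              # 4 states, 2 actions
Q, pi = regularized_value_iteration(rng.normal(size=(4, 2)), P)
```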
Video summarization attracts attention as a means of efficient video representation, retrieval, and browsing that eases volume and traffic surge problems. Although video summarization has mostly relied on the visual channel for compaction, the benefits of audio-visual modeling have appeared in the recent literature. The information coming from the audio channel can result from audio-visual correlation in the video content. In this study, we propose a new audio-visual video summarization framework that integrates four ways of fusing audio-visual information with GRU-based and attention-based networks. Furthermore, we investigate a new explainability methodology based on audio-visual canonical correlation analysis (CCA) to better understand and explain the role of audio in the video summarization task. Experimental evaluations on the TVSum dataset yield F1-score and Kendall-tau score improvements for audio-visual video summarization. Furthermore, splitting the video content of the TVSum and COGNIMUSE datasets into positively and negatively correlated videos based on audio-visual CCA yields strong performance improvements on the positively correlated videos for both audio-only and audio-visual video summarization.
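The CCA-based split can be sketched as follows; the feature matrices are synthetic stand-ins for the per-frame audio and visual embeddings a real pipeline would produce, so the numbers are purely illustrative.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Canonical correlation between per-frame visual and audio features.
# A high leading canonical correlation marks a "positively correlated" video.
rng = np.random.default_rng(0)
visual = rng.normal(size=(500, 128))                       # synthetic frame embeddings
audio = 0.5 * visual[:, :64] + rng.normal(size=(500, 64))  # audio partly tracks visuals

cca = CCA(n_components=4)
U, V = cca.fit_transform(visual, audio)
corrs = [abs(np.corrcoef(U[:, k], V[:, k])[0, 1]) for k in range(4)]
print(corrs)  # leading values near 1 indicate strong audio-visual correlation
```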
We consider a radio resource management (RRM) problem in a multi-user wireless network, where the goal is to optimize a network-wide utility function subject to constraints on the ergodic average performance of users. We propose a state-augmented parameterization for the RRM policy, where alongside the instantaneous network states, the RRM policy takes as input the set of dual variables corresponding to the constraints. We provide theoretical justification for the feasibility and near-optimality of the RRM decisions generated by the proposed state-augmented algorithm. Focusing on the power allocation problem with RRM policies parameterized by a graph neural network (GNN) and dual variables sampled from the dual descent dynamics, we numerically demonstrate that the proposed approach achieves a superior trade-off among mean, minimum, and 5th percentile rates compared to baseline methods.
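The state augmentation can be sketched with the dual-descent dynamics below; the GNN policy is replaced by a hypothetical placeholder, and the rate function and constraint levels are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

# Dual-descent dynamics generating the dual variables that augment the policy input.
def policy(state, lam):
    # hypothetical stand-in for the GNN: allocation responds to dual "pressure"
    return np.clip(lam * state, 0.0, 1.0)

def run_dual_descent(states, rate_fn, c_min, eta=0.05, T0=50):
    lam = np.zeros_like(c_min)
    window = []
    for t, s in enumerate(states):
        p = policy(s, lam)                     # RRM decision from (state, duals)
        window.append(rate_fn(s, p))
        if (t + 1) % T0 == 0:                  # update duals every T0 slots
            avg = np.mean(window, axis=0)      # ergodic-average constraint: avg >= c_min
            lam = np.maximum(lam - eta * (avg - c_min), 0.0)  # projected dual update
            window = []
    return lam
```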
Spiking neural networks (SNNs) offer a new computational paradigm capable of highly parallel, real-time processing. Photonic devices are ideal candidates for designing high-bandwidth, parallel architectures that match the SNN computational paradigm. The co-integration of CMOS and photonic elements allows low-loss photonic devices to be combined with analog electronics for greater flexibility in nonlinear computational elements. Accordingly, we design and simulate an optoelectronic spiking neuron circuit in a monolithic silicon photonics (SiPh) process that replicates useful spiking behaviors beyond leaky integrate-and-fire (LIF). Additionally, we explore two learning algorithms with the potential for on-chip learning using Mach-Zehnder interferometer (MZI) meshes as synaptic interconnects. A variant of random backpropagation (RPB) is demonstrated experimentally and matches the performance of standard linear regression on a simple classification task. Meanwhile, a contrastive Hebbian learning (CHL) rule is applied to a simulated neural network composed of MZI meshes for a random input-output mapping task. The CHL-trained MZI network performs better than random guessing but does not match the performance of an ideal neural network (without the constraints imposed by the MZI meshes). Through these efforts, we demonstrate that co-integrated CMOS and SiPh technologies are well suited for the design of scalable SNN computing architectures.
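The baseline spiking behavior the circuit is designed to replicate can be sketched in a few lines; the parameters below are illustrative, not those of the fabricated optoelectronic neuron.

```python
import numpy as np

# Discrete-time leaky integrate-and-fire (LIF) dynamics: the membrane voltage
# leaks toward zero, integrates its input, and spikes on crossing a threshold.
def lif(inputs, tau=20.0, v_th=1.0, v_reset=0.0, dt=1.0):
    v, spikes = 0.0, []
    for i in inputs:
        v += (dt / tau) * (-v + i)   # leaky integration
        spikes.append(v >= v_th)
        if spikes[-1]:
            v = v_reset              # reset after firing
    return np.array(spikes)

rate = lif(np.full(200, 1.5)).mean()  # constant supra-threshold drive: regular spiking
```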
Satisfaction measurement, which arises in every sector today, is a very important factor for many companies. In this study, the aim is to reach the highest possible accuracy rate with various machine learning algorithms, using data from Yemek Sepeti and variations of this data. The accuracy value of each algorithm is computed together with the various natural language processing methods used. While computing these accuracy values, the parameters of the algorithms are tuned for optimization. The models trained in this study can be used on unlabeled data and can give companies an idea when measuring customer satisfaction. It is observed that the three different natural language processing methods applied lead to an accuracy increase of approximately 5% in most of the developed models.
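A minimal sketch of such a workflow is shown below; since the study's exact algorithms and preprocessing steps are not specified here, TF-IDF features with a tuned logistic regression classifier stand in, and the labeled comments are toy placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical text-classification pipeline with parameter tuning.
pipe = Pipeline([("tfidf", TfidfVectorizer()),
                 ("clf", LogisticRegression(max_iter=1000))])
params = {"tfidf__ngram_range": [(1, 1), (1, 2)], "clf__C": [0.1, 1.0, 10.0]}

# Toy labeled comments; a real run would use the labeled customer reviews.
reviews = ["great food fast delivery", "cold meal very late", "tasty and on time",
           "never again awful service", "loved it will reorder", "wrong order twice"]
labels = [1, 0, 1, 0, 1, 0]

grid = GridSearchCV(pipe, params, cv=2)   # accuracy is the default classifier score
grid.fit(reviews, labels)
print(grid.best_params_, grid.best_score_)
```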
Current seismic design codes rely primarily on the strength and displacement capacity of structural members and do not account for the influence of ground-motion duration or hysteretic behavior characteristics. Energy-based approaches serve as supplemental indicators of response quantities that include the effect of repeated loads on seismic performance. The design philosophy suggests that the energy dissipation capacity of structural members should satisfy the seismic demand. Therefore, the energy dissipation behavior of structural members should be well understood to achieve an effective energy-based design approach. This study focuses on the energy dissipation capacity of reinforced concrete (RC) shear walls, which are widely used in high-seismicity regions because they provide significant stiffness and strength to resist lateral forces. A machine-learning-based predictive model, using Gaussian process regression (GPR), is developed for the energy dissipation capacity of shear walls as a function of wall design parameters. Eighteen design parameters are shown to influence energy dissipation, and the most important ones are identified by applying sequential backward elimination and by using feature-selection methods to reduce the complexity of the predictive model. The ability of the proposed model to make robust and accurate predictions is validated on new data, with a prediction accuracy (ratio of predicted to actual values) of approximately 1.00 and a coefficient of determination (R2) of 0.93. The outcomes of this study are believed to (i) quantify the wall properties that most influence the seismic energy dissipation capacity of shear walls for energy-based approaches and (ii) provide a predictive model that enables comparison of different wall design configurations to achieve higher energy dissipation capacity.
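A minimal version of the GPR workflow is sketched below, with synthetic data standing in for the wall test database and an off-the-shelf backward feature selector standing in for the paper's elimination procedure.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel
from sklearn.feature_selection import SequentialFeatureSelector

# GPR surrogate: wall design parameters -> energy dissipation capacity (toy data).
rng = np.random.default_rng(1)
X = rng.uniform(size=(60, 18))                       # 18 candidate design parameters
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.05 * rng.normal(size=60)

gpr = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gpr.fit(X, y)
pred, std = gpr.predict(X, return_std=True)          # std gives uncertainty bands
print(gpr.score(X, y))                               # coefficient of determination R^2

# Backward elimination analogue: iteratively drop the least useful parameters.
sfs = SequentialFeatureSelector(gpr, n_features_to_select=8, direction="backward")
# sfs.fit(X, y)  # slow, but flags the most influential design parameters
```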
The BioCreative VII Track 3 challenge focused on the identification of medication names in Twitter user timelines. For our submission to this challenge, we expanded the available training data by using several data augmentation techniques. The augmented data were then used to fine-tune an ensemble of language models that had been pre-trained on general-domain Twitter content. The proposed approach outperformed the prior state-of-the-art algorithm, Kusuri, and ranked high in the competition for our chosen objective function, the overlapping F1 score.
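The submission's specific augmentation techniques are not reproduced here; the sketch below shows two generic tweet-level augmentations (handle/URL normalization and random token deletion) that preserve the annotated drug-mention spans, with a made-up example tweet.

```python
import random

# Hypothetical tweet augmentations; annotated drug-name spans must survive.
def normalize(tokens):
    return ["@USER" if t.startswith("@") else "HTTPURL" if t.startswith("http") else t
            for t in tokens]

def random_deletion(tokens, keep, p=0.1, seed=0):
    rng = random.Random(seed)
    return [t for i, t in enumerate(tokens) if i in keep or rng.random() > p]

tweet = "just took @drsmith advice and started paxil http://t.co/x".split()
aug = random_deletion(normalize(tweet), keep={6})  # index 6 holds the drug mention
```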
In this paper, we propose a first-order distributed optimization algorithm that is provably robust to Byzantine failures, i.e., arbitrary and potentially adversarial behavior, where all participating agents are prone to failure. We model each agent's state over time as a two-state Markov chain that indicates Byzantine or trustworthy behavior at different time instants. We set no restriction on the maximum number of Byzantine agents at any given time. We design our method based on a three-layer defense: 1) temporal robust aggregation, 2) spatial robust aggregation, and 3) gradient normalization. We study two settings for stochastic optimization, namely sample average approximation and stochastic approximation. We provide convergence guarantees of our method for strongly convex and smooth nonconvex cost functions.
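The shape of the three-layer defense can be sketched as below; the specific robust aggregators here (coordinate-wise medians) are illustrative stand-ins for whatever the paper actually analyzes.

```python
import numpy as np

# Three-layer defense on a (time, agent, dim) tensor of reported gradients.
def robust_aggregate(grads_hist):
    """grads_hist: (T, n_agents, d) recent gradients reported by each agent."""
    temporal = np.median(grads_hist, axis=0)   # 1) temporal robust aggregation
    spatial = np.median(temporal, axis=0)      # 2) spatial robust aggregation
    norm = np.linalg.norm(spatial)
    return spatial / max(norm, 1.0)            # 3) gradient normalization

rng = np.random.default_rng(0)
g = rng.normal(size=(5, 10, 3))
g[:, :3] *= 100.0                              # three agents behave adversarially
step = robust_aggregate(g)                     # medians suppress the outliers
```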
We study two model selection settings in stochastic linear bandits (LB). In the first setting, which we refer to as feature selection, the expected reward of the LB problem is in the linear span of at least one of $M$ feature maps (models). In the second setting, the reward parameter of the LB problem is arbitrarily selected from $M$ models represented as (possibly) overlapping balls in $\mathbb{R}^d$. However, the agent only has access to misspecified models, i.e., estimates of the centers and radii of the balls. We refer to this setting as parameter selection. For each setting, we develop and analyze an algorithm based on a reduction from bandits to full-information problems. This allows us to obtain regret bounds that are no worse (up to a $\sqrt{\log M}$ factor) than the regret achievable when the true model is known. The regret of our parameter selection algorithm also scales logarithmically with the model misspecification. Finally, we empirically demonstrate the effectiveness of our algorithms using synthetic and real-world experiments.
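The flavor of a bandit-to-full-information reduction can be sketched with a Hedge-style meta-learner over the $M$ candidate models; the paper's actual algorithms and loss estimates differ, and the losses below are synthetic.

```python
import numpy as np

# Hedge (exponential weights) over M candidate models under full information.
def hedge(model_losses, eta=0.1, seed=0):
    """model_losses: (T, M) per-round loss of each candidate model."""
    rng = np.random.default_rng(seed)
    T, M = model_losses.shape
    w = np.ones(M) / M
    picks = []
    for t in range(T):
        picks.append(rng.choice(M, p=w))    # sample which model to follow
        w *= np.exp(-eta * model_losses[t]) # full-information weight update
        w /= w.sum()
    return picks

losses = np.random.default_rng(1).uniform(size=(1000, 5))
losses[:, 2] -= 0.3                         # model 2 is best on average
picks = hedge(losses)                       # concentrates on model 2 over time
```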
We consider learning approximate Nash equilibria for discrete-time mean-field games with nonlinear stochastic state dynamics subject to both average and discounted costs. To this end, we introduce a mean-field equilibrium (MFE) operator, whose fixed point is a mean-field equilibrium (i.e., an equilibrium in the infinite-population limit). We first prove that this operator is a contraction, and propose a learning algorithm to compute an approximate mean-field equilibrium by approximating the MFE operator with a random one. Moreover, using the contraction property of the MFE operator, we establish an error analysis of the proposed learning algorithm. We then show that the learned mean-field equilibrium constitutes an approximate Nash equilibrium for finite-agent games.
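Because the MFE operator is a contraction, the equilibrium can in principle be computed by plain fixed-point iteration with geometric convergence; the sketch below uses a hypothetical toy contraction in place of the actual operator.

```python
import numpy as np

# Generic fixed-point iteration; a contraction with modulus rho < 1 converges
# geometrically to its unique fixed point from any starting guess.
def fixed_point(T_op, mu0, tol=1e-8, max_iter=1000):
    mu = mu0
    for _ in range(max_iter):
        nxt = T_op(mu)
        if np.linalg.norm(nxt - mu) < tol:
            return nxt
        mu = nxt
    return mu

# Toy contraction on distributions over 3 states (rho = 0.5).
target = np.array([0.2, 0.3, 0.5])
T_op = lambda mu: 0.5 * mu + 0.5 * target
mu_star = fixed_point(T_op, np.ones(3) / 3)   # converges to `target`
```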